
these three predictions, for example, if beta strand and helix but no loop region are predicted simultaneously by the three lower-level networks.

Further tricks additionally improve the predictions of this software. In particular, many sequences with similar structure are automatically added to the query sequence as a multiple alignment. In this way, the secondary structure prediction reaches an accuracy of up to 80%, which is already very close to the theoretical optimum. The only way to become even more accurate is to predict the three-dimensional structure at the same time.
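As a purely illustrative sketch in Python (not the actual architecture of the software discussed above), the conflicting per-residue outputs of three class-specific networks could be combined into a single consensus prediction, for example by picking the best-supported class at each position:

import numpy as np

# Illustrative scores only: three "lower-level networks", one per class
# (helix H, beta strand E, loop L), each reporting a score per residue.
helix_score  = np.array([0.9, 0.8, 0.2, 0.1, 0.7])
strand_score = np.array([0.8, 0.3, 0.1, 0.2, 0.1])
loop_score   = np.array([0.1, 0.2, 0.9, 0.8, 0.3])

scores  = np.vstack([helix_score, strand_score, loop_score])  # shape (3, n_residues)
classes = np.array(["H", "E", "L"])

# The first residue illustrates the conflict mentioned above: helix and strand
# both score high while loop does not. A simple consensus (here a stand-in for
# the higher-level network) assigns each residue the class with the strongest
# support.
consensus = classes[np.argmax(scores, axis=0)]
print("".join(consensus))  # -> HHLLH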

Question 14.5

One such software is MemBrain (https://www.membrain-nn.de/index.htm; https://www.membrain-nn.de/).

Question 14.6

Please search the Internet for deep learning and read up on the topic. The page https://deeplearning.net/ is also helpful. For AlphaGo, see the Internet as well (https://deepmind.com/research/alphago; https://www.youtube.com/watch?v=mzpW10DPHeQ).

Question 14.7

Classification models are used in bioinformatics to distinguish between two categories (binary classification), for example in the diagnosis of a disease (sick/healthy). It is important to become familiar with the classification table (confusion matrix; TP, FP, FN, TN), but also with the performance metrics used to evaluate a classification model (sensitivity, false positive rate, specificity, PPV, NPV, accuracy, misclassification rate, prevalence, ROC, AUC). It is also important to understand, for example, the differences between sensitivity and PPV, and between specificity and NPV.

For example, imagine a person receives a positive (or negative) result from a predictive test with a sensitivity of 90%, a specificity of 99%, a PPV of 80%, and an NPV of 99%. The positive test result can be trusted only with 80% probability to indicate actual disease (20% of positive results are false positives, so the person is fortunately healthy), whereas a negative test result can be trusted far more to indicate actual health (only 1% of negative results are false negatives, i.e. persons who are actually sick). Most diagnostic testing procedures take this into account and, in the case of a positive test result, carry out a second test to confirm the diagnosis (e.g. mammography screening). On the other hand, a test should in any case be reliable enough that a negative result identifies a truly healthy person with high probability: it would be worse to send home a supposedly healthy person (negative test result) who is in fact sick (false negative) and who therefore receives no therapy or infects other people with a virus (e.g. COVID-19).

In addition, one should think about the problems that arise when building a classification model (too little data, etc.), as well as the requirements such a model should meet.
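As a minimal illustration of these metrics, the following Python sketch computes them from confusion-matrix counts. The counts (10,000 tested persons, 5% prevalence) are hypothetical assumptions chosen so that sensitivity and specificity roughly match the example above; they are not values from the text.

def confusion_metrics(tp, fp, fn, tn):
    """Common performance metrics from the counts of a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "sensitivity (TPR)":      tp / (tp + fn),   # recognized sick persons
        "specificity (TNR)":      tn / (tn + fp),   # recognized healthy persons
        "false positive rate":    fp / (fp + tn),   # = 1 - specificity
        "PPV":                    tp / (tp + fp),   # trust in a positive result
        "NPV":                    tn / (tn + fn),   # trust in a negative result
        "accuracy":               (tp + tn) / total,
        "misclassification rate": (fp + fn) / total,
        "prevalence":             (tp + fn) / total,
    }

# Hypothetical screening of 10,000 persons with 5% prevalence,
# a sensitivity of 90% and a specificity of 99%.
sick, healthy = 500, 9500
tp, fn = 450, 50      # 90% of the sick test positive
tn, fp = 9405, 95     # 99% of the healthy test negative

for name, value in confusion_metrics(tp, fp, fn, tn).items():
    print(f"{name:24s} {value:.3f}")
# PPV comes out at about 0.83 and NPV at about 0.995, i.e. a negative result
# can be trusted more than a positive one, as discussed above.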

To build a predictive model, it is advisable to split the data into a training and a test set (e.g. 80/20%) and to validate the model on at least one independent dataset in order to better assess its predictive power.
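A minimal sketch of this workflow, assuming scikit-learn is available and using synthetic data in place of a real bioinformatics dataset, could look like this:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score

# Synthetic binary data (sick/healthy); a real study would load measured features.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.8, 0.2],
                           random_state=0)

# 80/20 split; stratify keeps the class proportions (prevalence) equal in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Confusion matrix and AUC on the held-out test set.
tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}  AUC={auc:.2f}")

# The same evaluation should then be repeated on at least one independently
# collected dataset to judge how well the model generalizes.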
